target - get gene specific tracks (bedgraph, wig, etc)
what would be nice is a bed with every CG, methylated and non-meth in sep files, with loci redundancy
Maybe not ideal but a start
----
files to start with
---
lets move to a individual gene basis
New Fasta from gene gff?
mRNA gff…..
Tool: bedtools getfasta (aka fastaFromBed)
Version: v2.17.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.
Usage: bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta>
Options:
-fi Input FASTA file
-bed BED/GFF/VCF file of ranges to extract from -fi
-fo Output file (can be FASTA or TAB-delimited)
-name Use the name field for the FASTA header
-split given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
-tab Write output in TAB delimited format.
- Default is FASTA format.
-s Force strandedness. If the feature occupies the antisense,
strand, the sequence will be reverse complemented.
- By default, strand information is ignored.
robertsmac:bin sr320$ ./fastaFromBed -fi /Volumes/web/cnidarian/oyster.v9.fa -bed /Users/sr320/Desktop/oyster.v9.glean.final.rename.mRNA.gff -fo /Volumes/web/cnidarian/TJGR_genomic_gene.fasta
./fastaFromBed -name -fi /Volumes/web/cnidarian/oyster.v9.fa -bed /Users/sr320/Desktop/PLEASE.gff -fo /Volumes/web/cnidarian/TJGR_genomic_gene.fasta
Now lets run BSMAP just on the this
./bsmap -a //Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam -p 8
Input reference file: /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa (format: FASTA)
Load in 28027 db seqs,
total_kmers: 43046721
Create seed table. 26 secs passed
max number of mismatches: read_length * 8% max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100 max Ns: 5 seed size: 16 index interval: 4
quality cutoff: 0 base quality char: '!'
min fragment size:28 max fragemt size:500
start from read #1 end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand: ++,-+
Single-end alignment(8 threads)
Input read file: //Volumes/NGS Drive/NGS Raw Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz (format: gzipped FASTQ)
Output file: /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam (format: SAM) size 226602635 bp. 8 secs passed
Total number of aligned reads: 93661017 (67%)
Done.
Finished at Fri May 3 03:10:08 2013
Total time consumed: 51264 secs
methratio
python
methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_A.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam
@ Fri May 3 06:33:51 2013: reading reference /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa ...
@ Fri May 3 06:34:05 2013: reading /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam ...
[samopen] SAM header is present: 28027 sequences.
running on hummingbird..
[sam_read1] reference 'ZS:Z:-+' is recognized as '*'.
Parse error at line 59068474: invalid CIGAR character
@ Fri May 3 07:00:45 2013: combining CpG methylation from both strands ...
@ Fri May 3 07:01:02 2013: writing /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_A.txt ...
@ Fri May 3 07:13:35 2013: done.
total 35267328 valid mappings, 42064194 covered cytosines, average coverage: 5.02 fold.
Getting error in methratio error so lets try bsmap with BAM output
./bsmap -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v2.bam -p 10
Input reference file: /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta (format: FASTA)
Load in 28027 db seqs, total size 226602635 bp. 7 secs passed
total_kmers: 43046721
Create seed table. 17 secs passed
max number of mismatches: read_length * 8% max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100 max Ns: 5 seed size: 16 index interval: 4
quality cutoff: 0 base quality char: '!'
min fragment size:28 max fragemt size:500
start from read #1 end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand: ++,-+
Single-end alignment(10 threads)
Input read file: /Volumes/NGS Drive/NGS Raw Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz (format: gzipped FASTQ)
Output file: /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v2.bam (format: SAM, automatically convert to BAM)
-hummingbird
Total number of aligned reads: 93661017 (67%)
Done.
Finished at Fri May 3 15:07:33 2013
Total time consumed: 24661 secs
---
try the BSP format
hummingbird
d-128-95-149-219:bsmap-2.73 sr320$ ./bsmap -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP -p 3
RUNNING
--
ALSO lets try SAM with uncompressed
./bsmap -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam -p 6
---
now methratio on each….
python
methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_C.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP
d-128-95-149-219:bsmap-2.73 sr320$ python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_C.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP
@ Sun May 5 06:49:42 2013: reading reference /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa ...
@ Sun May 5 06:49:56 2013: reading /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP ...
@ Sun May 5 06:54:44 2013: read 10000000 lines
@ Sun May 5 06:59:33 2013: read 20000000 lines
@ Sun May 5 07:04:21 2013: read 30000000 lines
@ Sun May 5 07:09:09 2013: read 40000000 lines
@ Sun May 5 07:13:59 2013: read 50000000 lines
@ Sun May 5 07:18:47 2013: read 60000000 lines
@ Sun May 5 07:23:39 2013: read 70000000 lines
@ Sun May 5 07:28:33 2013: read 80000000 lines
@ Sun May 5 07:33:25 2013: read 90000000 lines
@ Sun May 5 07:35:13 2013: combining CpG methylation from both strands ...
@ Sun May 5 07:35:31 2013: writing /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_C.txt ...
@ Sun May 5 07:50:00 2013: done.
total 56211463 valid mappings, 48430264 covered cytosines, average coverage: 6.92 fold.
d-128-95-149-219:bsmap-2.73 sr320$
python
methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_D.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam
d-128-95-149-219:bsmap-2.73 sr320$ python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_D.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam
@ Sun May 5 06:50:32 2013: reading reference /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa ...
@ Sun May 5 06:50:44 2013: reading /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam ...
[samopen] SAM header is present: 28027 sequences.
@ Sun May 5 06:55:21 2013: read 10000000 lines
@ Sun May 5 06:59:57 2013: read 20000000 lines
@ Sun May 5 07:04:33 2013: read 30000000 lines
@ Sun May 5 07:09:09 2013: read 40000000 lines
@ Sun May 5 07:13:47 2013: read 50000000 lines
@ Sun May 5 07:18:24 2013: read 60000000 lines
@ Sun May 5 07:23:04 2013: read 70000000 lines
@ Sun May 5 07:27:45 2013: read 80000000 lines
@ Sun May 5 07:32:26 2013: read 90000000 lines
@ Sun May 5 07:34:09 2013: combining CpG methylation from both strands ...
@ Sun May 5 07:34:27 2013: writing /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_D.txt ...
@ Sun May 5 07:49:08 2013: done.
total 56211459 valid mappings, 48430256 covered cytosines, average coverage: 6.92 fold.
d-128-95-149-219:bsmap-2.73 sr320$